Patent Abstract:
LOAD-STORE DEPENDENCY PREDICTOR CONTENT MANAGEMENT. The present invention relates to methods and apparatus for managing load-store dependencies in an out-of-order processor. A load-store dependency predictor may include a table to store entries for load-store pairs that are predicted to be dependent and likely to execute out of order. Each entry in the table includes a counter indicating the strength of the dependency prediction. If the counter is above a threshold, a dependency is enforced for the load-store pair. If the counter is below the threshold, the dependency is not enforced for the load-store pair. When a store is dispatched, the table is searched, and any matching entries in the table are armed. If a dispatched load corresponds to an armed entry whose counter is above the threshold, then the load waits to issue until the corresponding store has issued.
Publication number: BR102013010877B1
Application number: R102013010877-4
Filing date: 2013-05-02
Publication date: 2021-07-06
Inventors: Stephan G. Meier; John H. Mylius; Gerard R. Williams III; Suparn Vats
Applicant: Apple Inc.
Primary IPC:
Patent description:

Field of Invention
[0001] The present invention relates generally to processors and, more particularly, to methods and mechanisms for managing load-store dependencies in processors.
Description of the Related Art
[0002] Superscalar processors attempt to achieve high performance by issuing and executing multiple instructions per clock cycle and by employing the highest clock frequency consistent with the design. One way to increase the number of instructions executed per clock cycle is out-of-order execution. In out-of-order execution, instructions may be executed in an order different from that specified in the program sequence (or program order).
[0003] Some processors may schedule instructions out of order and/or speculatively as aggressively as possible in an attempt to maximize the performance gain realized. For example, it may be desirable to schedule load memory operations before older store memory operations, since load memory operations more typically have dependent instructions. However, in some cases, a load memory operation may depend on an older store memory operation (e.g., the store memory operation updates at least one byte accessed by the load memory operation). In such cases, the load memory operation is incorrectly executed if it executes before the store memory operation. If a load memory operation is executed before an older dependent store memory operation, the processor may need to be flushed and redirected, which degrades processor performance.
[0004] An operation is older than another operation if it is earlier than the other operation in program order. An operation is younger than another operation if it follows the other operation in program order. Similarly, operations may be indicated as being before or after other operations, or may be referred to as previous operations, preceding operations, subsequent operations, etc. Such references refer to the program order of the operations. Furthermore, a "load memory operation" or "load operation" refers to a transfer of data from memory or cache to a processor, and a "store memory operation" or "store operation" refers to a transfer of data from a processor to memory or cache. Load operations and store operations may be referred to more succinctly as "loads" and "stores", respectively.
[0005] While the dependencies between loads and stores are dynamic, the mechanisms for preventing these events are typically static in nature. Therefore, in an effort to prevent an ordering violation for a store-load pair, the processor will most likely overcompensate and not schedule aggressively out of order. In this case, the processor enforces unnecessary in-order execution of instructions. If a dependency is no longer required but is nevertheless still enforced, then memory-level parallelism is reduced and processor efficiency decreases.
Summary
[0006] Systems, apparatuses, processors, and methods for predicting load-store dependencies are contemplated. A processor may include at least a dispatch unit, a load-store dependency predictor, and a reservation station. When an ordering violation between a younger load and an older dependent store is detected, this constitutes a training event for the load-store dependency predictor. After the load-store pair has been trained, the predictor may add a dependency to the load the next time the load passes through the dispatch unit. This added dependency indicates that the load should not be issued from the reservation station until the store has been issued.
[0007] In one embodiment, a predictor table may be used to store load-store pairs that are found to be dependent. When a younger load issues before an older store with which it shares an address dependency, an entry may be allocated in the predictor table, and, in one embodiment, the entry may identify at least a portion of the store's program counter (PC) value and at least a portion of the dependent load's PC value. Each entry in the predictor table may also include a counter field, and the counter field may represent the strength of the dependency prediction for that particular load-store pair. The counter field allows predicted dependencies to be turned off when they become stale.
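The table entry described above can be sketched as a simple data structure. This is an illustrative software model, not the patented hardware: the field set follows the text, but the field widths, the 2-bit counter ceiling, and the threshold value of 2 are assumptions.

```python
from dataclasses import dataclass

@dataclass
class LSDEntry:
    """One load-store dependency predictor table entry (illustrative model)."""
    valid: bool = False      # entry holds a trained load-store pair
    store_pc: int = 0        # portion of the store's program counter value
    load_pc: int = 0         # portion of the dependent load's program counter value
    store_rnum: int = 0      # ROB index of the store that armed the entry
    armed: bool = False      # set when a matching store is dispatched
    counter: int = 0         # strength of the dependency prediction

THRESHOLD = 2  # assumed threshold for enforcing a predicted dependency

def dependency_enforced(entry: LSDEntry) -> bool:
    # A dependency is applied only when the prediction is strong enough.
    return entry.valid and entry.counter >= THRESHOLD
```

With this model, an entry whose counter has decayed below the threshold still exists in the table but no longer causes loads to wait, matching the "turned off when stale" behavior described above.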
[0008] The value of the counter field may also affect the replacement policy for the predictor table. A replacement pointer may constantly scan the predictor entries looking for entries with low counter values. In one embodiment, each time the predictor table is accessed, the replacement pointer may advance. When the replacement pointer finds an entry with a counter value equal to zero, the pointer may stop at that entry. When a new entry is allocated for a new dependent load-store pair, the existing entry with the zero counter indicated by the pointer may be used for the new entry.
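The replacement-pointer behavior above can be sketched as follows. This is a hedged model under the assumptions that the pointer advances one position per table access and parks on the first zero-counter entry it encounters, which then becomes the victim for the next allocation.

```python
class ReplacementPointer:
    """Scans predictor entries for a zero-counter victim (illustrative model)."""

    def __init__(self, num_entries):
        self.num_entries = num_entries
        self.pos = 0
        self.parked = False  # True when stopped on a zero-counter entry

    def on_access(self, counters):
        # Advance one position per table access until a zero-counter entry
        # is found; stay parked while that entry's counter remains zero.
        if self.parked and counters[self.pos] == 0:
            return
        self.parked = False
        self.pos = (self.pos + 1) % self.num_entries
        if counters[self.pos] == 0:
            self.parked = True

    def victim(self, counters):
        # The parked-on entry is reused when a new entry is allocated.
        return self.pos if counters[self.pos] == 0 else None
```

The pointer never evicts a strong entry outright; a new allocation simply reuses whichever entry has already decayed to zero.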
[0009] These and other aspects and advantages will be apparent to those skilled in the art in view of the following detailed descriptions of the approaches presented herein.
Brief Description of the Drawings
[0010] The above and further advantages of the methods and mechanisms may be better understood with reference to the following description, in conjunction with the accompanying drawings, in which:
[0011] FIG. 1 illustrates one embodiment of a portion of an integrated circuit;
[0012] FIG. 2 is a block diagram illustrating one embodiment of a processor core;
[0013] FIG. 3 is a block diagram illustrating one embodiment of a map/dispatch unit and reservation stations;
[0014] FIG. 4 illustrates one embodiment of a load-store dependency predictor table;
[0015] FIG. 5 is a block diagram illustrating one embodiment of the representation of counter values used in a load-store dependency predictor table;
[0016] FIG. 6 is a generalized flow diagram illustrating one embodiment of a method for processing a load operation;
[0017] FIG. 7 is a generalized flow diagram illustrating one embodiment of a method for adjusting a load-store dependency prediction strength indicator;
[0018] FIG. 8 is a generalized flow diagram illustrating one embodiment of a method for replacing entries in a load-store dependency predictor table;
[0019] FIG. 9 is a block diagram of one embodiment of a system; and
[0020] FIG. 10 is a block diagram of one embodiment of a computer readable medium.
Detailed Description of Embodiments
[0021] In the description that follows, numerous specific details are set forth to provide a thorough understanding of the methods and mechanisms presented herein. However, those skilled in the art will recognize that the various embodiments may be practiced without these specific details. In some instances, well-known structures, components, signals, computer program instructions, and techniques have not been shown in detail in order not to obscure the approaches described herein. It should be appreciated that, for simplicity and clarity of illustration, the elements shown in the figures are not necessarily drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements.
[0022] This specification includes references to "one embodiment". The appearance of the phrase "in one embodiment" in different contexts does not necessarily refer to the same embodiment. Particular features, structures, or characteristics may be combined in any suitable manner consistent with this specification. Furthermore, as used throughout this specification, the word "may" is used in a permissive sense (i.e., meaning "having the potential to") rather than a mandatory sense (i.e., meaning "must"). Similarly, the words "include", "including", and "includes" mean "including, but not limited to".
[0023] Terminology: The paragraphs below provide definitions and/or context for terms found in this specification (including the appended claims).
[0024] "Comprising": This term is open-ended. As used herein, it does not exclude additional structures or steps. With respect to an embodiment reciting a load-store dependency predictor, such an embodiment does not exclude that the processor may include additional components (e.g., a cache, a fetch unit, an execution unit).
[0025] "Configured to": Various units, circuits, and other components may be described or claimed as "configured to" perform a task or tasks. In such contexts, "configured to" is used to connote structure, indicating that the units/circuits/components include structure (e.g., circuitry) that performs the task or tasks during operation. Thus, the unit/circuit/component can be said to be configured to perform the task even when the unit/circuit/component is not currently operational (e.g., is not turned on). Units/circuits/components used with the "configured to" language include, for example, circuits, memory storing executable program instructions to implement an operation, etc. Stating that a unit/circuit/component is "configured to" perform one or more tasks is expressly intended not to invoke 35 U.S.C. § 112, sixth paragraph, for that unit/circuit/component. Additionally, "configured to" may include a generic structure (e.g., generic circuitry) operated by software or firmware (e.g., an FPGA or a general purpose processor executing software) to operate in a manner capable of performing the task(s) in question. "Configured to" may also include adapting a manufacturing process (e.g., a semiconductor fabrication facility) to fabricate devices (e.g., integrated circuits) that are adapted to implement or perform one or more tasks.
[0026] "Based on": As used herein, this term is used to describe one or more factors that affect a determination. This term does not exclude additional factors that may affect a determination. That is, a determination may be based solely on these factors or at least partially on these factors. Consider the phrase "determine A based on B": while B may be a factor that affects the determination of A, such a phrase does not preclude the determination of A from also being based on C. In other instances, A may be determined based solely on B.
[0027] Referring now to FIG. 1, a block diagram illustrating one embodiment of a portion of an integrated circuit is shown. In the illustrated embodiment, IC 10 includes processor complex 12, memory controller 22, and memory physical interface circuits (PHYs) 24, 26. It is noted that IC 10 may also include many other components not shown in FIG. 1. In various embodiments, IC 10 may also be referred to as a system on chip (SoC), an application specific integrated circuit (ASIC), or an apparatus.
[0028] Processor complex 12 may include central processing units (CPUs) 14, 16, level two (L2) cache 18, and bus interface unit (BIU) 20. In other embodiments, processor complex 12 may include other numbers of CPUs. CPUs 14, 16 may also be referred to as processors or cores. CPUs 14, 16 may be coupled to L2 cache 18. L2 cache 18 may be coupled to BIU 20, which may be coupled to memory controller 22. Other embodiments may include additional levels of cache (e.g., level three (L3) cache). It is noted that processor complex 12 may include other components not shown in FIG. 1.
[0029] CPUs 14, 16 include circuitry to execute instructions defined in an instruction set architecture. Specifically, one or more programs comprising the instructions may be executed by CPUs 14, 16. Any instruction set architecture may be implemented in the various embodiments. For example, in one embodiment, the PowerPC® instruction set architecture may be implemented. Other exemplary instruction set architectures include the ARM® instruction set, the MIPS® instruction set, the SPARC™ instruction set, the x86 (or IA-32) instruction set, the IA-64 instruction set, etc.
[0030] In various embodiments, CPUs 14, 16 may execute instructions out of order, which in some cases may cause ordering violations. For example, in the case of load and store instructions, an ordering violation may occur when a younger load executes before an older store with overlapping physical addresses. To prevent this type of ordering violation, or to prevent it from recurring, various techniques may be employed to keep a younger load from executing before an older store on which it depends. In one embodiment, each of CPUs 14, 16 may include a load-store dependency predictor to keep track of load-store pairs that are predicted or expected to be dependent and that also tend to execute out of order. In one embodiment, the dependent load-store pairs may be recorded in a table.
[0031] Sometimes the predictor may train on a load-store pair for which the dependency is an infrequent case. This can happen because the dependency between a load and a store instruction may be address-based, and load and store instruction addresses may change over time. In other words, load-store dependencies are dynamic. Some of the entries in the table may no longer be accurate after a while, and enforcing dependencies for inaccurate entries may cause the processor to delay load operations unnecessarily, without any benefit.
[0032] To prevent stale entries from accumulating in the table, and to prevent dependencies from being enforced for load-store pairs that match stale entries, each table entry may include an indicator representing the strength of the dependency prediction. The indicator may determine whether a dependency is enforced for a given load-store pair. The indicator may also affect the replacement policy for the table entries, such that entries with low indicator values may be replaced when a new entry is allocated in the table.
[0033] Each of CPUs 14, 16 may also include a level one (L1) cache (not shown), and each L1 cache may be coupled to L2 cache 18. In one embodiment, L2 cache 18 may be configured to store instructions and data for low-latency access by CPUs 14, 16. L2 cache 18 may comprise any capacity and construction (e.g., direct mapped, set associative). L2 cache 18 may be coupled to memory controller 22 via BIU 20. BIU 20 may also include various other logic structures to couple CPUs 14, 16 and L2 cache 18 to various other devices and blocks.
[0034] Memory controller 22 may include any number of memory ports and may include circuitry configured to interface to memory. For example, memory controller 22 may be configured to interface to dynamic random access memory (DRAM), such as synchronous DRAM (SDRAM), double data rate (DDR) SDRAM, DDR2 SDRAM, Rambus DRAM (RDRAM), etc. Memory controller 22 may also be coupled to memory physical interface circuits (PHYs) 24, 26. Memory PHYs 24, 26 are representative of any number of memory PHYs that may be coupled to memory controller 22. Memory PHYs 24, 26 may be configured to interface to memory devices (not shown).
[0035] It is noted that other embodiments may include other combinations of components, including subsets or supersets of the components shown in FIG. 1, and/or other components. While one instance of a given component may be shown in FIG. 1, other embodiments may include two or more instances of the given component. Similarly, throughout this detailed description, two or more instances of a given component may be included even if only one is shown, and/or embodiments including only one instance may be used even if multiple instances are shown.
[0036] Turning now to FIG. 2, one embodiment of a processor core is shown. Core 30 is one example of a processor core, and core 30 may be utilized within a processor complex, such as processor complex 12 of FIG. 1. In one embodiment, each of CPUs 14, 16 of FIG. 1 may include the components and functionality of core 30. Core 30 may include fetch and decode (FED) unit 32, map/dispatch unit 36, memory management unit (MMU) 40, core interface unit (CIF) 42, execution units 44, and load-store unit (LSU) 46. It is noted that core 30 may include other components and interfaces not shown in FIG. 2.
[0037] FED unit 32 may include circuitry configured to read instructions from memory and place them in level one (L1) instruction cache 34. L1 instruction cache 34 may be a cache memory for storing instructions to be executed by core 30. L1 instruction cache 34 may have any capacity and construction (e.g., direct mapped, set associative, fully associative, etc.). Furthermore, L1 instruction cache 34 may have any cache line size. FED unit 32 may also include branch prediction hardware configured to predict branch instructions and to fetch down the predicted path. FED unit 32 may also be redirected (e.g., via misprediction, exception, interrupt, flush, etc.).
[0038] FED unit 32 may also be configured to decode instructions into instruction operations (ops). Generally, an instruction operation may be an operation that the hardware included in execution units 44 and LSU 46 is capable of executing. Each instruction may translate to one or more instruction operations which, when executed, result in the performance of the operations defined for that instruction according to the instruction set architecture. FED unit 32 may be configured to decode multiple instructions in parallel.
[0039] In some embodiments, each instruction may decode into a single instruction operation. FED unit 32 may be configured to identify the type of instruction, source operands, etc., and each decoded instruction operation may comprise the instruction along with some of the decode information. In other embodiments in which each instruction translates to a single op, each op may simply be the corresponding instruction or a portion thereof (e.g., the opcode field(s) of the instruction). In some embodiments, FED unit 32 may include any combination of circuitry and/or microcode for generating ops for instructions. For example, relatively simple op generation (e.g., one or two ops per instruction) may be handled in hardware, while more extensive op generation (e.g., more than three ops per instruction) may be handled in microcode. In other embodiments, the functionality included in FED unit 32 may be split into two or more separate units, such as a fetch unit, a decode unit, and/or other units.
[0040] Decoded ops may be provided to map/dispatch unit 36. Map/dispatch unit 36 may be configured to map ops and architectural registers to physical registers of core 30. Map/dispatch unit 36 may implement register renaming to map source register addresses from the ops to source operand numbers identifying the renamed source registers. Map/dispatch unit 36 may also be configured to dispatch ops to reservation stations within execution units 44 and LSU 46. Map/dispatch unit 36 may include load-store dependency (LSD) predictor 37 and reorder buffer (ROB) 38. Prior to being dispatched, the ops may be written to ROB 38. ROB 38 may be configured to hold ops until they can be committed in order. Each op may be assigned a ROB index (RNUM) corresponding to a specific entry in ROB 38. RNUMs may be used to keep track of the operations in flight in core 30. Map/dispatch unit 36 may also include other components (e.g., mapper array, dispatch unit, dispatch buffer) not shown in FIG. 2. Furthermore, in other embodiments, the functionality included in map/dispatch unit 36 may be split into two or more separate units, such as a map unit, a dispatch unit, and/or other units.
[0041] LSD predictor 37 may be configured to train on and predict load-store instruction pairs that are likely to issue out of order. LSD predictor 37 may include a table with entries for the load-store pairs that have been trained, and each entry may include information identifying the load and store instructions and the strength of the prediction. In one embodiment, a training event may be an ordering violation triggered by the execution of a younger load before an older store with overlapping physical addresses. In one embodiment, the table may be a 256-entry fully associative structure. In other embodiments, the table may have other numbers of entries. In various embodiments, the table may be a content-addressable memory (CAM) for various fields of the table.
[0042] When an ordering violation occurs between dependent load and store operations, core 30 may be redirected and resynchronized. Various corrective actions may be taken as a result of the redirect. At this point, training may be performed for the particular load-store pair that caused the resynchronization. An entry for this particular pair may be allocated in LSD predictor 37, and the strength of the prediction may be set to a high level. Then, on a subsequent pass through the pipeline of core 30, when the store of the load-store pair is dispatched from unit 36, LSD predictor 37 may be searched for the store. The matching entry may be found and armed. When the load of the load-store pair is dispatched from unit 36, a search of LSD predictor 37 may be performed for the load, and the load will match on the armed entry. Then, the load may be dispatched to a reservation station with a dependency, causing the load to wait for the store before issuing from the reservation station.
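The train → arm → predict sequence of paragraphs [0041] and [0042] can be modeled as three events on the table. This is an illustrative software sketch, not the hardware CAM: the initial counter value after training (STRONG) and the threshold value are assumptions.

```python
STRONG = 3      # assumed counter value assigned after a training event
THRESHOLD = 2   # assumed prediction-strength threshold

class LSDPredictor:
    """Minimal model of training, arming, and load lookup (illustrative)."""

    def __init__(self):
        self.entries = []  # each dict stands in for one CAM row

    def train(self, store_pc, load_pc):
        # An ordering violation allocates an entry with a strong prediction.
        self.entries.append({"store_pc": store_pc, "load_pc": load_pc,
                             "counter": STRONG, "armed": False,
                             "store_rnum": None})

    def store_dispatched(self, store_pc, store_rnum):
        # A dispatched store searches the table; matching strong entries
        # are armed and capture the store's RNUM.
        for e in self.entries:
            if e["store_pc"] == store_pc and e["counter"] >= THRESHOLD:
                e["armed"] = True
                e["store_rnum"] = store_rnum

    def load_dispatched(self, load_pc):
        # A load matching an armed entry receives a dependency on the
        # armed store's RNUM; an empty list means no dependency.
        return [e["store_rnum"] for e in self.entries
                if e["armed"] and e["load_pc"] == load_pc]
```

A returned RNUM is what the reservation station would use to hold the load until the matching store issues.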
[0043] LSD predictor 37 may also be configured to disarm an entry if the store that armed the entry has been flushed from the instruction pipeline before the store issued. For example, there may be certain scenarios, such as when a fault occurs, in which an armed entry of LSD predictor 37 needs to be disarmed. A load operation could otherwise be made dependent on, and wait for, a store operation that has already been flushed, which could end up deadlocking core 30. In this case, when a store operation is flushed from core 30, the table of LSD predictor 37 may be searched for any armed entries that correspond to the flushed store. Any entries found for the flushed store may be disarmed. In one embodiment, each entry of LSD predictor 37 may include a store RNUM to identify the specific store of the load-store pair.
[0044] Execution units 44 may include any number and type of execution units (e.g., floating point, vector). Each of execution units 44 may also include one or more reservation stations (not shown). CIF 42 may be coupled to LSU 46, FED unit 32, MMU 40, and an L2 cache (not shown). CIF 42 may be configured to manage the interface between core 30 and the L2 cache. MMU 40 may be configured to perform address translation and memory management functions.
[0045] LSU 46 may include L1 data cache 48, reservation stations 50 and 52, store queue 54, and load queue 56. Load and store operations may be dispatched from map/dispatch unit 36 to reservation stations 50, 52. Other embodiments may include other numbers of reservation stations. Operations may issue out of order from reservation stations 50 and 52. Store queue 54 may store data corresponding to store operations, and load queue 56 may store data associated with load operations. LSU 46 may also be coupled to the L2 cache via CIF 42. It is noted that LSU 46 may also include other components (e.g., register file, prefetch unit, translation lookaside buffer) not shown in FIG. 2.
[0046] A load-store ordering violation may be detected by LSU 46 when an older store is issued. In one embodiment, the store address of the older store may be compared against all younger loads in load queue 56. If a match is detected, then the load operation may already have completed with incorrect data. This may be corrected by signaling a redirect back to map/dispatch unit 36 using the RNUMs of the load and store operations. Map/dispatch unit 36 may flush the instructions from the pipeline of core 30, redirect the front end of core 30 back to the address of the load instruction, and fetch the load instruction again. To prevent future redirects, map/dispatch unit 36 may record the dependency between the load and the store in LSD predictor 37 and communicate the predicted dependency to reservation stations 50, 52.
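The violation check described above — comparing an issuing older store's address against younger completed loads in the load queue — can be sketched as follows. Word-granularity address matching and the use of RNUM order as a stand-in for program order (ignoring wraparound) are simplifying assumptions.

```python
def detect_ordering_violation(store_addr, store_rnum, load_queue):
    """Return the RNUMs of younger completed loads whose address overlaps
    the issuing store's address; each such load read stale data and
    forces a redirect (illustrative model of the load-queue CAM check)."""
    return [ld["rnum"] for ld in load_queue
            if ld["completed"]
            and ld["rnum"] > store_rnum    # load is younger in program order
            and ld["addr"] == store_addr]  # addresses overlap
```

Any non-empty result would trigger the redirect back to the map/dispatch unit and a training event for the predictor.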
[0047] In a typical case, when a store is dispatched, the store may search LSD predictor 37, and if a match is found for the store, the matching entry in the table may be armed (i.e., activated), and the store RNUM may be written to the entry. Later, the load may be dispatched, and a search of the table for loads may be performed. In one embodiment, the identifying values used to search LSD predictor 37 may be at least a portion of the load and store PC values. In another embodiment, the identifying values used in the search and stored in the entries may be hashed values derived from at least a portion of the PC values, at least a portion of the architectural register values, and/or at least a portion of the micro-op values. Other identifiers are possible and are contemplated.
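The hashed identifiers mentioned above can be illustrated with a toy hash that folds PC bits, an architectural register number, and a micro-op number into a small search tag. The bit widths and the XOR-fold are purely assumptions for illustration; any hash of these fields would fit the description.

```python
def lsd_tag(pc, arch_reg, micro_op, tag_bits=10):
    """Fold portions of the PC value, an architectural register number,
    and a micro-op number into a small search tag (illustrative hash,
    not the one used in any real implementation)."""
    mask = (1 << tag_bits) - 1
    # Drop the low PC bits (instruction alignment), mix in the other fields.
    return ((pc >> 2) ^ (arch_reg << 4) ^ micro_op) & mask
```

The tag is deterministic, so the same instruction produces the same tag at store dispatch and at load lookup, which is all the predictor search requires.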
[0048] In various embodiments, the load may match on any number of entries in LSD predictor 37. In one embodiment, for a match to occur, the entry needs to be armed. If the load matches on an armed entry, then a dependency on the store RNUM may be created by linking the RNUM of the armed store to the load. The load may be marked as waiting for that particular store RNUM to issue from the reservation station. There may be a dependency field for loads in the reservation stations, and the load may be marked as dependent on a given store issuing from one of reservation stations 50, 52. Thus, in this case, the load may be marked as waiting for a specific store RNUM, and the load may issue one or more cycles after the specific store is issued.
[0049] If the load corresponds to multiple store entries, this case may be referred to as a multimatch. In this case, the load may wait until all older stores have issued. For example, in one embodiment, a bit may be set so that the load waits until all older stores have issued before the load itself is issued. This forces all older stores to issue from reservation stations 50, 52 ahead of the load. In one embodiment, each of reservation stations 50, 52 may broadcast the oldest store it contains. Once the load becomes older than both of these stores, then the load may be issued.
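The multimatch policy above amounts to a small decision rule, sketched here under the assumptions that each reservation station reports the RNUM of its oldest store (or None when it holds no stores) and that RNUM order stands in for program order without wraparound.

```python
def load_can_issue_multimatch(load_rnum, oldest_store_rnums):
    """With a multimatch, the load waits until it is older than every
    store remaining in the reservation stations: it may issue only when
    no reservation station still holds a store older than the load
    (illustrative model)."""
    return all(oldest is None or load_rnum < oldest
               for oldest in oldest_store_rnums)
```

This is deliberately conservative: rather than tracking every matched store, the load simply outwaits all older stores, which is why the text describes it with a single "wait for all older stores" bit.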
[0050] Each of reservation stations 50, 52 may include a picker that is configured to pick any valid operations for issue. When a store is valid and is picked and issued, a tag may be broadcast, and the load that is dependent on this store will then match on this tag. This marks the load as eligible to be issued from the reservation station. In other words, the store produces a tag that is consumed by the load. In one embodiment, the tag may be the RNUM of the store. In one embodiment, the RNUM may be a 9-bit value, although in other embodiments the size of the RNUM may vary. A load with a dependency may have an extra source stored with the load in the reservation station, and this extra source may be the RNUM of the store from the same entry of LSD predictor 37.
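The tag broadcast described above works like a producer-consumer wakeup, sketched here with the store RNUM as the tag. The 9-bit width follows the text; the dictionary-based matching is an illustrative assumption standing in for the reservation station's comparators.

```python
RNUM_BITS = 9  # the text gives a 9-bit RNUM in one embodiment

def wakeup_dependent_loads(issued_store_rnum, waiting_loads):
    """When an issuing store broadcasts its RNUM tag, loads whose extra
    dependency source matches the tag become eligible to issue
    (illustrative reservation-station wakeup model)."""
    tag = issued_store_rnum & ((1 << RNUM_BITS) - 1)
    for load in waiting_loads:
        if load.get("dep_store_rnum") == tag:
            load["eligible"] = True
    return waiting_loads
```

Loads whose stored extra source does not match the broadcast tag simply keep waiting, exactly as an unsatisfied register source would.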
[0051] When a load matches on an entry in LSD predictor 37 and the entry is armed, this indicates there is a valid store on which the load needs to wait. The entry may also include an indicator of the strength of the prediction. In one embodiment, the indicator may be a counter, and if the value of the counter is above a threshold, then the entry may be considered a strong, highly likely prediction, and the load-store dependency may be established. The value of the threshold may vary from embodiment to embodiment. If the load matches on an armed entry but the indicator is weak, indicating that the prediction should not be used, then a dependency may not be established for the load. If the load-store dependency is established, then the load may pick up the RNUM of the store, such that the RNUM is read out of the entry and passed along to the reservation station with the load when the load is dispatched. The load may also be marked as dependent in the reservation station.
[0052] In one embodiment, a store issuing from a reservation station causes a tag to be broadcast only if the store is marked as a valid producer. When a store searches LSD predictor 37 and a match is not found, then the store will not be established as a valid producer. If the store finds a valid entry in LSD predictor 37 and the prediction strength indicator is above the threshold (i.e., the prediction is turned on), then the entry may be armed. In one embodiment, if the prediction strength indicator is below the threshold, then the store does not arm the entry, even though the store matches the store identifier of the entry. In some embodiments, the entry may be armed whenever the store finds a match, regardless of the value of the prediction strength indicator. A store may match on multiple entries, and multiple entries may be armed for a single store.
[0053] When a load matches on an armed entry of LSD predictor 37, the load is marked as dependent, and the load may wait to issue from the reservation station until the corresponding store has issued from the reservation station. Then, once the load with the established dependency issues, it may be determined where the load receives its data from. Depending on where the load receives its data from, the prediction strength indicator in the corresponding entry of LSD predictor 37 may be increased, decreased, or left unchanged.
[0054] For example, if the load data was forwarded from store queue 54, then the prediction of LSD predictor 37 may be considered good. In this case, the data from the store had not yet made it to cache 48, so it was advantageous for the load to wait for the store. If the load data for this load operation is still in store queue 54, this indicates that there really is a dependency between the load and the store. In other words, the data needed to be forwarded from store queue 54 to the dependent load.
[0055] If the load data misses in store queue 54, then the dependency is no longer valid. There may have been a dependency previously, but the load or store address has since changed, so the load and store no longer collide. In this case, if the load data is retrieved from cache 48, then the data may have been stored in cache 48 for a long period of time. Therefore, determining whether the load data was forwarded from store queue 54 or from cache 48 may indicate whether or not the prediction was accurate. Furthermore, the prediction strength indicator stored in the corresponding entry of LSD predictor 37 may be updated based on this determination. If the prediction was accurate and the load data was forwarded from store queue 54, then the prediction strength indicator may be increased. If the load data came from cache 48, then the prediction strength indicator may be decreased. In other embodiments, other techniques may be used to determine whether the dependency prediction is accurate.
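The feedback rule of paragraphs [0053] through [0055] — strengthen the prediction when the data was forwarded from the store queue, weaken it when the data came from the cache — can be sketched as a saturating counter update. The 2-bit counter ceiling is an assumption; the text only requires that the indicator can be increased, decreased, or left unchanged.

```python
COUNTER_MAX = 3  # assumed saturating-counter ceiling (a 2-bit counter)

def update_strength(counter, forwarded_from_store_queue):
    """Saturating update of the prediction-strength counter based on where
    the dependent load actually received its data (illustrative model)."""
    if forwarded_from_store_queue:
        # The prediction was useful: the store data had not reached the
        # cache yet, so the load was right to wait.
        return min(counter + 1, COUNTER_MAX)
    # The data came from the cache: the dependency appears stale, weaken it.
    return max(counter - 1, 0)
```

Repeated cache hits decay the counter to zero, at which point the entry stops enforcing dependencies and becomes a candidate for the replacement pointer.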
[0056] It should be understood that the distribution of functionality illustrated in FIG. 2 is not the only possible microarchitecture that may be used for a processor core. Other processor cores may include other components, omit one or more of the components shown, and/or include a different arrangement of functionality among the components.
[0057] Referring now to FIG. 3, a block diagram of one embodiment of a map/dispatch unit and a reservation station is shown. In one embodiment, map/dispatch unit 60 may include a register mapper 62, a reorder buffer (ROB) 64, a load-store dependency (LSD) predictor 66, and a dispatch unit 68. Register mapper 62 and LSD predictor 66 are coupled to receive ops from a decoder unit (not shown). LSD predictor 66 is coupled to receive PCs from the decoder unit and coupled to receive Redirect and Count Update signals from a load-store unit (not shown). LSD predictor 66 is also coupled to a replacement pointer that searches LSD predictor 66 for entries that may be discarded when a new entry is allocated.
[0058] Register mapper 62 may be configured to map architectural registers to physical registers and to provide physical register addresses to dispatch unit 68. Dispatch unit 68 may be configured to dispatch ops to reservation stations 70A to 70N. Dispatch unit 68 may be configured to maintain a free list of reservation station entries in reservation stations 70A to 70N, and may generally assign entries to ops so as to balance the load between reservation stations 70A to 70N.
[0059] LSD predictor 66 may be configured to check for stores and loads among the dispatched ops and to compare the PCs of any detected stores and loads against the store and load PCs that previously caused ordering violations and were allocated entries in the training table. If the PC of a given store matches, LSD predictor 66 may be configured to arm the corresponding entry of the training table. In one embodiment, LSD predictor 66 may check the prediction strength indicator before arming the entry. If the indicator is above the threshold, the entry may be armed; otherwise, if the indicator is below the threshold, the entry may not be armed. Additionally, LSD predictor 66 may be configured to capture the RNUM assigned to the store as the store identifier.
[0060] When a load corresponding to an armed entry is detected, and the prediction strength indicator of the armed entry is above the threshold, LSD predictor 66 may be configured to use the store identifier to establish a dependency of the load on the store, preventing the load from issuing from reservation station 70 until after the store has issued. In one embodiment, LSD predictor 66 may be configured to forward the RNUM of the store to a given reservation station 70, along with an indicator showing that the load has a dependency. Additionally, if there are multiple matches for the load, LSD predictor 66 may forward a multimatch indicator to the given reservation station 70. In other embodiments, LSD predictor 66 may be configured to forward multiple store RNUMs to reservation station 70 in the multimatch case, and reservation station 70 may be configured to store more than one store RNUM per load. Other embodiments may indicate store dependencies in other ways.
[0061] Reservation stations 70A to 70N are representative of any number of reservation stations, which may be used as part of load/store units (not shown) and/or execution units (not shown). Each reservation station 70A to 70N may be configured to hold ops until those ops are executed by a corresponding functional unit. An example of a reservation station 70A entry in accordance with one embodiment is shown in FIG. 3. Each of reservation stations 70A to 70N may include various numbers of entries, depending on the embodiment. Each entry may include a dependency indicator, a multimatch indicator, the store RNUM of a dependency, a load/store (L/S) indicator to indicate whether the op is a load or a store, and the PC of the op. In other embodiments, the entry may include other fields (e.g., source register, destination register, source operands) and/or omit one or more of the fields shown in FIG. 3. Furthermore, other types of entries (e.g., floating point) may be formatted differently.
[0062] LSD predictor 66 may be configured to identify load/store pairs that cause ordering violations based on a redirect indication. The redirect indication may include the load and store PCs, or other load and store identifiers. LSD predictor 66 is thus trained by the stores and loads that cause ordering violations, to prevent such events in the future when the same code sequence is fetched and re-executed in the processor.
[0063] Register mapper 62 may include a memory with an entry for each logical register. The entry for each logical register in register mapper 62 may store the RNUM of the most recent op to update that logical register. Additional status can be stored in the rename map entries as well. For example, a bit may indicate whether or not the most recent op has been executed. In such an embodiment, register mapper 62 may receive signals from a given reservation station 70 identifying the ops that have been issued, which may permit register mapper 62 to update the bit. A bit indicating whether or not the most recent op was a no-op may also be included.
[0064] It should be noted that not all of the connections between the units shown in FIG. 3 are illustrated, and map/dispatch unit 60 may include additional circuitry implementing other operations (not shown). For example, register mapper 62 and ROB 64 may receive redirect indications to adjust their mappings to account for the ops being flushed. Additionally, register mapper 62 and ROB 64 may receive an indication of retiring ops to adjust their state to the retirement (e.g., freeing entries for assignment to new ops, updating the architected rename state, etc.). These operations are ancillary to the operation of LSD predictor 66 and therefore are not described in more detail here.
[0065] It should be noted that, while PCs and RNUMs are used as store identifiers and PCs are used as load identifiers, other embodiments may use any identifiers that uniquely identify the instructions in flight in the processor (for example, any kind of tag or sequence number).
[0066] Turning now to FIG. 4, one embodiment of a load-store dependency predictor table is shown. Table 90 may include various numbers of entries, depending on the embodiment. Each entry may correspond to a load-store pair that is predicted to have overlapping addresses and to issue out of order. An entry may be placed in table 90 in response to detection of an ordering violation. When an ordering violation occurs, the store queue may flush the processor, including the load that caused the violation, back to the fetch unit, and table 90 may be trained on that violation, such that an entry for that specific load-store pair is added to table 90. Typically, the store that triggered the redirect will already have issued, so when the flushed load is refetched and decoded, its entry in table 90 will not be armed, and the load can issue normally. On future executions of the store at that PC, the store arms the corresponding entry of table 90 and prevents the load from issuing until after the store has issued.
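The training step above can be pictured with a short sketch. This is not from the patent itself; the table size, the field names, the default counter value of two ("weakly enabled" per FIG. 5), and the simple victim choice are all illustrative assumptions (the actual replacement policy is discussed with FIG. 8):

```python
from dataclasses import dataclass

@dataclass
class Entry:
    """One predictor table entry, loosely mirroring FIG. 4."""
    valid: bool = False
    store_pc: int = 0
    load_pc: int = 0
    store_rnum: int = 0   # captured later, when the store dispatches
    armed: bool = False
    counter: int = 0      # prediction strength

TABLE_SIZE = 8            # illustrative; real tables vary by embodiment
table = [Entry() for _ in range(TABLE_SIZE)]

def train_on_violation(store_pc: int, load_pc: int) -> None:
    """Called after a load issued ahead of an older dependent store
    and forced a flush: allocate (or strengthen) the pair's entry."""
    for e in table:
        if e.valid and e.store_pc == store_pc and e.load_pc == load_pc:
            e.counter = min(e.counter + 1, 3)   # saturating strengthen
            return
    # Simplified victim choice: first invalid entry, else entry 0.
    victim = next((e for e in table if not e.valid), table[0])
    victim.valid = True
    victim.store_pc = store_pc
    victim.load_pc = load_pc
    victim.armed = False      # armed later, when the store dispatches
    victim.counter = 2        # assumed "weakly enabled" default
```

On the first re-execution after the flush the entry is present but unarmed, so the load issues normally, matching the behavior described above.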
[0067] Table 90 may be configured to allow multiple simultaneous accesses and updates by multiple ops. Furthermore, while table 90 is illustrated as an integrated table, the different fields may instead be separate tables corresponding to separate memories, with the entries of the separate tables associated with one another. For example, the load PCs may be one table and the store PCs may be a separate table, with a load PC entry corresponding to the store PC entry for which a specific load-store ordering violation was detected and trained.
[0068] Each entry may include a valid indicator 92, which may indicate whether the entry is a valid entry and whether the entry should be used to enforce the load-store dependency indicated by the entry. In one embodiment, the valid indicator 92 may be cleared at reset. The valid indicator 92 may also affect the replacement policy, so that invalid entries are the first entries to be replaced when new entries are allocated. In some embodiments, the valid indicator 92 may not be included in the entries of table 90. Instead, in those embodiments, the value of counter field 102 may be used to indicate whether the entry is valid. Other embodiments may exclude counter field 102 from the table and use only the valid indicator 92.
[0069] Each entry may also include a store PC value 94 to identify the specific store operation. In some embodiments, the store PC value may be combined with architectural registers and/or hashed. When a store is dispatched, the store PCs of table 90 may be searched for the PC of the dispatched store. Table 90 may be a CAM for the store PC field, with each entry in the memory including circuitry to perform the comparison. The store PC field may also be a set of registers and comparators that are operated as a CAM. If a dispatched store matches any entries, those entries may have the armed bit 98 set. The RNUM of the store may also be written into the store RNUM field 96 of the entry. When a store issues from a reservation station, the armed bit 98 may be cleared from any entries of table 90 that were previously armed by that particular store.
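A minimal sketch of this store-dispatch search and the later disarm-on-issue, using plain dictionaries for entries. The threshold value and the choice to check the counter at arm time are assumptions taken from one of the embodiments described above, not a definitive implementation:

```python
THRESHOLD = 2  # assumed: "weakly enabled" and above arms (see FIG. 5)

def dispatch_store(table, store_pc, store_rnum):
    """CAM-style search on the store PC field: arm every matching entry
    whose prediction strength is at or above the threshold, and capture
    the store's RNUM in the entry."""
    for e in table:
        if e["valid"] and e["store_pc"] == store_pc and e["counter"] >= THRESHOLD:
            e["armed"] = True
            e["store_rnum"] = store_rnum

def issue_store(table, store_rnum):
    """When the store issues from a reservation station, clear the armed
    bit of any entries that this particular store previously armed."""
    for e in table:
        if e["armed"] and e["store_rnum"] == store_rnum:
            e["armed"] = False
```

In hardware both loops would be parallel comparisons across all entries; the sequential loop here is only for clarity.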
[0070] When a load is dispatched, the load PC value 100 of each entry of table 90 may be searched for the PC of the dispatched load. Table 90 may be a CAM for the load PC field. If a dispatched load matches any armed entry, a dependency may be established and enforced for that specific load. If the load matches an unarmed entry, a dependency is not established, because the corresponding store either has not been dispatched or has already issued, and so an ordering violation should not occur. If the load matches multiple armed entries, the load may wait until all older stores have issued before issuing itself. If the load matches a single armed entry, the RNUM of the store may be written to the reservation station along with the load. A dependency bit may also be set for the load at the reservation station, to indicate that the load has a valid dependency.
[0071] Each entry may also include a counter field 102. The value of counter 102 may indicate the strength of the prediction for the particular load-store pair in the entry. In one embodiment, counter 102 may be a two-bit up-down counter. In other embodiments, the counter may use other numbers of bits. Furthermore, counter 102 may be configured to saturate at its maximum and minimum values.
[0072] When a store matches an entry, the value of counter 102 may be checked before arming the entry. If the value of counter 102 is below a threshold, the entry may not be armed. If the value of counter 102 is above the threshold, the entry may be armed. In some embodiments, the entry may be armed without checking the value of counter 102. When a load matches an entry, the value of counter 102 may also be checked. The dependency is enforced only if the value of counter 102 is above the threshold. The threshold value may vary from embodiment to embodiment and may be adjusted according to specific operating conditions.
[0073] In another embodiment, expired counters may be used with the entries of table 90. Each entry may include an expired counter, and the expired counter may be set to an initial value when the entry is first allocated. An interval counter may also be used to count out a programmable interval, and when the interval counter expires, each expired counter in table 90 may be decremented. The interval counter may then restart and count out the programmable interval again. Each time an entry is accessed or armed by a load-store pair, its expired counter may be incremented by a fixed amount. If an entry of table 90 goes unused, its expired counter will eventually reach zero, at which point the entry may be replaced by a new entry.
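The expired-counter aging scheme can be sketched as follows. The increment amount and the initial representation are illustrative assumptions; the text above only specifies a programmable interval, a fixed increment, and replacement at zero:

```python
REFRESH = 4   # hypothetical fixed increment applied on each access/arm

def on_interval_expired(entries):
    """Interval counter elapsed: age every entry's expired counter."""
    for e in entries:
        e["expired"] = max(0, e["expired"] - 1)

def on_access(entry):
    """Entry accessed or armed by a load-store pair: refresh its age."""
    entry["expired"] += REFRESH

def replaceable(entry):
    """An entry whose expired counter reached zero may be replaced."""
    return entry["expired"] == 0
```

Frequently used pairs stay refreshed above zero, while stale pairs drift down to zero over successive intervals and become replacement candidates.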
[0074] In other embodiments, table 90 may include additional fields and/or omit one or more of the fields shown in FIG. 4. Furthermore, table 90 may be formatted differently in other embodiments.
[0075] Referring now to FIG. 5, a representation of counter values corresponding to load-store pair entries in a predictor table is shown in accordance with one embodiment. The counter values are represented by a two-bit counter in table 110. In other embodiments, other numbers of bits may be used for the counter.
[0076] In one embodiment, a counter value of "11", or three, may represent "strongly enabled". For an entry with this counter value, the dependency for the load-store pair will be enforced. A counter value of "10", or two, may represent "weakly enabled". If an entry is "weakly enabled", the dependency will also be enforced. A counter value of "01", or one, may represent "weakly disabled". If an entry is "weakly disabled", the dependency will not be enforced for the corresponding load-store pair. A counter value of "00", or zero, may represent "strongly disabled". In some embodiments, "strongly disabled" may also indicate that the entry is invalid. The threshold in the embodiment shown in FIG. 5 is between two and one. In other embodiments, the threshold may take other values.
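The two-bit saturating counter can be captured in a few lines. This sketch assumes the conventional encoding of 0 = strongly disabled through 3 = strongly enabled, with the threshold between one and two:

```python
STRONGLY_DISABLED, WEAKLY_DISABLED, WEAKLY_ENABLED, STRONGLY_ENABLED = 0, 1, 2, 3
THRESHOLD = WEAKLY_ENABLED   # dependency enforced at 2 ("10") and above

def enforce(counter: int) -> bool:
    """A dependency is applied only for weakly/strongly enabled entries."""
    return counter >= THRESHOLD

def increment(counter: int) -> int:
    """Strengthen the prediction, saturating at the maximum value."""
    return min(counter + 1, STRONGLY_ENABLED)

def decrement(counter: int) -> int:
    """Weaken the prediction, saturating at the minimum value."""
    return max(counter - 1, STRONGLY_DISABLED)
```

Saturation at both ends means a long run of confirmations (or misses) never wraps the counter around, which is what makes the two-bit scheme hysteretic.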
[0077] In one embodiment, when an entry is first allocated, the counter for the new entry may be set to "weakly enabled" by default. When the counter is "weakly disabled" (counter = 1), a load-store pair that matches the entry will not have a dependency established; instead, the load may issue without a dependency. In other embodiments, other counter sizes may be used, and the counter values may have different representations.
[0078] Turning now to FIG. 6, one embodiment of a method for processing a load operation is shown. For purposes of discussion, the steps of this embodiment are shown in sequential order. It should be noted that, in various embodiments of the method, one or more of the elements described may be performed concurrently or in a different order than shown. Additional elements may also be performed as desired. Additionally, sections of the flowchart may be performed in parallel to process multiple load operations simultaneously.
[0079] A load operation may be received by a map/dispatch unit (block 120). The load operation may have been decoded in an earlier stage of the processor pipeline. The load-store dependency predictor table may be searched for entries with the same PC as the load operation (block 122). After the search is performed, it may be determined how many matches were found (conditional block 124). If no match is found (conditional block 124), the load may be dispatched to a reservation station without a dependency (block 126). The load may match unarmed entries, but such unarmed matches are not considered actual matches and do not require a dependency to be enforced. Similarly, if the load matches an armed entry but the prediction indicator counter is below the threshold, this is not an actual match either, and so a dependency will not be enforced. In some embodiments, the counter need not be compared against the threshold for the load if the store has already checked the counter before arming the entry.
[0080] If the load has no dependency that needs to be enforced, this can be indicated in a variety of ways. For example, in one embodiment, a dependency bit may be cleared to indicate that the load has no dependency. After block 126, the selector may pick the load to issue from the reservation station at any time, without waiting for any other operations to issue (block 132).
[0081] If a single match with an armed entry is found, the load may be dispatched to a reservation station with a dependency (block 128). The RNUM of the corresponding store may be written into the reservation station entry along with the load. In one embodiment, for an entry to be considered a match, the counter field of the entry may need to be above a threshold. For example, if the load matches an armed entry but the counter field of the entry is below the threshold (i.e., "weakly disabled" or "strongly disabled"), this may not constitute an actual match. After block 128, the load may wait to issue until the corresponding store, on which it depends, has issued (block 134).
[0082] If multiple matches with armed entries are found for the load (conditional block 124), the load may be dispatched to the reservation station with a multimatch indicator set (block 130). The load may then wait to issue from the reservation station until all older stores have issued (block 136). The load/store unit may include multiple reservation stations, and each reservation station may be configured to keep track of the oldest store among its entries. When the load with multiple matches is dispatched, the oldest store in each reservation station may be recorded, and after the oldest store in each reservation station has issued, the load may issue one cycle later.
[0083] Turning now to FIG. 7, one embodiment of a method for adjusting a load-store dependency prediction strength indicator is shown. For purposes of discussion, the steps of this embodiment are shown in sequential order. It should be noted that, in various embodiments of the method described below, one or more of the elements described may be performed concurrently, in a different order than shown, or omitted entirely. Additional elements may also be performed as desired.
[0084] A load with a dependency may issue from a reservation station (block 140). The load may have been delayed from issuing until after the corresponding store of the load-store pair had issued. The corresponding store may issue from the same reservation station or from a different reservation station. After the load has issued from the reservation station and executed, it may be determined where the load data was obtained from (block 142).
[0085] If the load data was forwarded from the store queue (conditional block 144), the dependency prediction for that particular load-store pair can be considered good, and the counter of the corresponding entry in the load-store dependency predictor may be incremented (block 146). If the load data misses in the store queue (conditional block 144), a dependency on the store can no longer be assumed for the load (i.e., the dependency prediction is no longer valid), and the counter of the corresponding entry in the load-store dependency predictor may be decremented (block 148). This method may be performed in parallel for a plurality of different loads with dependencies.
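Blocks 142 to 148 of FIG. 7 amount to one saturating update keyed on where the data came from. A minimal sketch, assuming a two-bit counter saturating at 0 and 3:

```python
def update_on_issue(entry, data_from_store_queue: bool):
    """After a dependent load issues: forwarding from the store queue
    confirms the prediction (increment); a store-queue miss, i.e. data
    taken from the cache, weakens it (decrement). Saturates at 0 and 3."""
    if data_from_store_queue:
        entry["counter"] = min(entry["counter"] + 1, 3)
    else:
        entry["counter"] = max(entry["counter"] - 1, 0)
```

Because the update is per-entry, independent dependent loads can apply it in parallel, as the paragraph above notes.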
[0086] Turning now to FIG. 8, one embodiment of a method for replacing entries in a load-store dependency predictor table is shown. For purposes of discussion, the steps of this embodiment are shown in sequential order. It should be noted that, in various embodiments of the method described below, one or more of the elements described may be performed concurrently, in a different order than shown, or omitted entirely. Additional elements may also be performed as desired.
[0087] A pointer may point to a group of adjacent entries in the load-store dependency predictor table, and the counter values of that group of adjacent entries may be analyzed (block 160). In one embodiment, the group may include four entries. In other embodiments, the group may include other numbers of entries. The entry with the lowest counter value may then be selected (block 162). If more than one entry has the lowest counter value, the pointer may select any of those entries at random, or it may differentiate among the entries with the lowest counter value using some other value or metric.
[0088] If a new entry needs to be allocated at this point, for a newly trained load-store pair with a dependency (conditional block 164), the selected entry with the lowest counter value in the group may be discarded and the new entry allocated in its place (block 166). It should be noted that a new load-store pair may be allocated in response to a redirect being signaled, and that the redirect may occur at any point; therefore, conditional block 164 could be located elsewhere in the flowchart of FIG. 8. After the new entry is allocated, the pointer may move to the next group of entries (block 172). If a new entry does not need to be allocated at this time (conditional block 164), it may be determined whether the lowest counter value is zero (conditional block 168).
[0089] If the lowest counter value is zero (conditional block 168), the pointer may remain at its current position and wait for an entry to be allocated (block 170). If the lowest counter value is not zero (conditional block 168), the pointer may move to the next group of entries in the predictor (block 172). In one embodiment, the pointer may wait to move to the next group of entries until a load or store accesses the load-store dependency predictor. In another embodiment, the pointer may move to the next group of entries on the next clock cycle. After block 172, the method may return to block 160 to analyze the entries of that group. The method illustrated in FIG. 8 is one possible implementation of a replacement policy; in other embodiments, other replacement policies (e.g., least recently used) may be used.
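The pointer-and-group victim selection of FIG. 8 can be sketched in two small helpers. The group size of four matches the embodiment above; breaking ties by position (rather than randomly) is a simplification made here:

```python
def pick_victim(table, pointer, group=4):
    """Examine the group of adjacent entries at the pointer and select
    the index of the entry with the lowest counter (block 162).
    Ties are broken by position in this sketch."""
    window = table[pointer:pointer + group]
    idx = min(range(len(window)), key=lambda i: window[i]["counter"])
    return pointer + idx

def advance(pointer, table_size, group=4):
    """Move the pointer to the next group of entries (block 172),
    wrapping around the table."""
    return (pointer + group) % table_size
```

When the lowest counter in the group is already zero, the pointer would simply stay put (block 170) so that the zeroed entry is the next victim; otherwise it advances and the scan continues.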
[0090] Referring next to FIG. 9, a block diagram of one embodiment of a system 180 is shown. As shown, system 180 may represent a chip, circuitry, components, etc., of a desktop computer 190, laptop computer 200, tablet computer 210, cell phone 220, or the like. In the illustrated embodiment, system 180 includes at least one instance of IC 10 (of FIG. 1) coupled to an external memory 182.
[0091] IC 10 is coupled to one or more peripherals 184 and to external memory 182. A power supply 186 is also provided, which supplies the supply voltages to IC 10 as well as one or more supply voltages to memory 182 and/or peripherals 184. In various embodiments, power supply 186 may be a battery (e.g., a rechargeable battery in a smartphone, laptop, or tablet computer). In some embodiments, more than one instance of IC 10 may be included (and more than one external memory 182 as well).
[0092] Memory 182 may be any type of memory, such as dynamic random access memory (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR, DDR2, DDR3, etc.), including mobile versions of the SDRAMs such as mDDR3, etc., and/or low-power versions of the SDRAMs such as LPDDR2, etc., RAMBUS DRAM (RDRAM), static RAM (SRAM), etc. One or more memory devices may be coupled to a circuit board to form memory modules, such as single inline memory modules (SIMMs), dual inline memory modules (DIMMs), etc. Alternatively, the devices may be mounted with IC 10 in a chip-on-chip configuration, a package-on-package configuration, or a multi-chip module configuration.
[0093] Peripherals 184 may include any desired circuitry, depending on the type of system 180. For example, in one embodiment, peripherals 184 may include devices for various types of wireless communication, such as Wi-Fi, Bluetooth, cellular, global positioning system (GPS), etc. Peripherals 184 may also include additional storage, including RAM storage, solid state storage, or disk storage. Peripherals 184 may include user interface devices, such as a display screen, including touch display screens or multitouch display screens, keyboards or other input devices, microphones, speakers, etc.
[0094] Referring now to FIG. 10, one embodiment of a block diagram of a computer readable medium 230 including one or more data structures representative of the circuitry included in IC 10 (of FIG. 1) is shown. Generally speaking, computer readable medium 230 may include any non-transitory storage media such as magnetic or optical media, e.g., a CD-ROM or DVD-ROM disk, volatile or non-volatile memory media such as RAM (e.g., SDRAM, RDRAM, SRAM, etc.), ROM, etc., as well as media accessible via transmission media or signals such as electrical, electromagnetic, or digital signals conveyed via a communication medium such as a network and/or a wireless link.
[0095] Generally, the data structure(s) of the circuitry on computer readable medium 230 may be read by a program and used, directly or indirectly, to fabricate the hardware comprising the circuitry. For example, the data structure(s) may include one or more behavioral-level descriptions or register-transfer level (RTL) descriptions of the hardware functionality in a high-level design language (HDL) such as Verilog or VHDL. The description(s) may be read by a synthesis tool, which may synthesize the description to produce one or more netlists comprising lists of gates from a synthesis library. The netlist(s) comprise a set of gates that also represent the functionality of the hardware comprising the circuitry. The netlist(s) may then be placed and routed to produce one or more data sets describing the geometric shapes to be applied to masks. The masks may then be used in various semiconductor fabrication steps to produce a semiconductor circuit or circuits corresponding to the circuitry. Alternatively, the data structure(s) on computer readable medium 230 may be the netlist(s) (with or without the synthesis library) or the data set(s), as desired. In yet another alternative, the data structures may comprise the output of a schematic program, or the netlist(s) or data set(s) derived therefrom.
[0096] While computer readable medium 230 includes a representation of IC 10, other embodiments may include a representation of any portion or combination of portions of IC 10 (e.g., LSD predictor 37, LSU 46).
[0097] It should be emphasized that the above-described embodiments are only non-limiting examples of implementations. Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the embodiments be interpreted as embracing all such variations and modifications.
Claims (14)
[0001]
1. Processor, characterized by the fact that it comprises: - a reorder buffer (64); - one or more reservation stations (70); and - a load-store dependency predictor (66) coupled to the one or more reservation stations (70), the load-store dependency predictor (66) comprising: a table (90) with entries, wherein each entry comprises: a load identifier, a store identifier, a reorder buffer entry number, a prediction strength indicator, and an armed bit (98) used to indicate that a store operation having an identifier that corresponds to the store identifier of the respective entry has been dispatched; and circuitry configured to: set the armed bit (98) of an entry of the table (90) in response to detecting dispatch of a store operation having an identifier corresponding to the store identifier stored in said entry of the table (90); in response to detecting that a given load operation has an identifier that corresponds to the load identifier of a given entry whose armed bit (98) is set: predict that the given load operation is dependent on a given store operation having an identifier that corresponds to the store identifier of the given entry; increment the prediction strength indicator of the given entry in response to determining that data of the given load operation is retrieved from a first location, the first location comprising a store queue; and decrement the prediction strength indicator of the given entry in response to determining that the data of the given load operation is retrieved from a second location that is different from the first location, wherein the second location is a cache.
[0002]
2. Processor, according to claim 1, characterized by the fact that the prediction strength indicator of each entry of the table (90) comprises a prediction strength counter (102).
[0003]
3. Processor, according to claim 2, characterized by the fact that the circuitry of the load-store dependency predictor (66) is further configured to: set the armed bit (98) of the given entry in response to detecting that the given store operation has an identifier that corresponds to the store identifier of the given entry and determining that the prediction strength counter (102) of the given entry is above a threshold; and store a reorder buffer entry number of the given store operation in the given entry.
[0004]
4. Processor, according to claim 3, characterized by the fact that a reservation station (70) of the one or more reservation stations (70) stores the given load operation with an identification of the reorder buffer entry number of the given entry.
[0005]
5. Processor, according to claim 4, characterized by the fact that, in response to issuance of the given store operation: the reorder buffer entry number of the given store operation is broadcast; the broadcast reorder buffer entry number is detected by the reservation station (70) storing the given load operation; and the given load operation is permitted to issue in response to detecting that the broadcast reorder buffer entry number of the given store operation matches the identification of the reorder buffer entry number stored with the given load operation.
[0006]
6. Processor according to claim 2, characterized in that each table entry (90) further comprises an expired counter that is decremented in response to the expiration of a programmable time period.
[0007]
7. Processor, according to claim 1, characterized by the fact that, in response to detecting issuance of a given store operation from a reservation station (70), the circuitry is configured to clear an armed bit (98) in the table (90) that was previously set in response to dispatch of the given store operation.
[0008]
8. Load-store dependency predictor (66) content management method, characterized by the fact that it comprises the steps of: maintaining a table (90) with entries, each entry comprising: a load identifier, a store identifier, a reorder buffer entry number, a prediction strength indicator, and an armed bit (98) used to indicate that a store operation having an identifier that corresponds to the store identifier of the respective entry has been dispatched; setting the armed bit (98) of an entry of the table (90) in response to detecting dispatch of a store operation having an identifier corresponding to the store identifier stored in said entry of the table (90); in response to detecting that a given load operation has an identifier that corresponds to the load identifier of a given entry whose armed bit (98) is set, predicting that the given load operation is dependent on a given store operation having an identifier that corresponds to the store identifier of the given entry; incrementing (146) the prediction strength indicator of the given entry in response to determining that data of the given load operation is retrieved from a first location, the first location comprising a store queue; and decrementing (148) the prediction strength indicator of the given entry in response to determining that the data of the given load operation is retrieved from a second location that is different from the first location, wherein the second location is a cache.
[0009]
9. Method, according to claim 8, characterized by the fact that it further comprises: dispatching the given load operation, which has a particular identifier; searching the table (90) for a load identifier that matches the particular identifier; and in response to finding a single entry that has a load identifier corresponding to the particular identifier and an armed bit (98) that is set, establishing a dependency between the given load operation and a store operation corresponding to a reorder buffer entry number of the single entry.
[0010]
10. Method, according to claim 8, characterized by the fact that it further comprises: dispatching the given load operation, which has a particular identifier; searching the table (90) for a load identifier that matches the particular identifier; and in response to finding multiple corresponding entries, each corresponding entry having a load identifier that corresponds to the particular identifier and an armed bit (98) that is set, establishing a dependency between the given load operation and multiple store operations.
[0011]
11. Method, according to claim 9, characterized by the fact that it further comprises: storing the given load operation in a reservation station (70) with an identification of the reorder buffer entry number of the single entry; broadcasting the reorder buffer entry number of the single entry in response to issuance of the store operation corresponding to the reorder buffer entry number of the single entry; and permitting the given load operation to issue from the reservation station (70) in response to detection of the broadcast reorder buffer entry number.
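The reservation-station wakeup in claim 11 amounts to tag matching against a broadcast reorder buffer entry number. A minimal software model of that mechanism follows; the class and method names are assumptions, and a real reservation station would of course track many more fields per entry.

```python
class ReservationStation:
    def __init__(self):
        self.waiting = {}   # load_id -> ROB entry number it waits on

    def hold_load(self, load_id, rob_num):
        # Park the load with the ROB number of the store it depends on.
        self.waiting[load_id] = rob_num

    def broadcast_store_issue(self, rob_num):
        # A store has issued: free every load whose dependency tag matches
        # the broadcast ROB entry number, and return the freed loads.
        freed = [load for load, tag in self.waiting.items() if tag == rob_num]
        for load in freed:
            del self.waiting[load]
        return freed
```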
12. Method according to claim 10, characterized in that it further comprises: identifying the oldest store operation in each reservation station (70) of a plurality of reservation stations (70) in response to establishing the dependency; and allowing the given load operation to issue from a reservation station (70) of the plurality of reservation stations (70) in response to the issue of all of the oldest store operations identified in the respective reservation stations (70) of the plurality of reservation stations (70).
13. Method according to claim 8, characterized in that each table (90) entry comprises an age-out counter that is decremented in response to the expiration of a programmable period of time.
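The age-out counter of claim 13 lets stale predictor entries expire so the table does not fill with dependencies that no longer recur. A sketch of one plausible aging policy, with illustrative field names and the eviction rule (reclaim at zero) as an assumption:

```python
def age_entries(entries, period_elapsed):
    """Decrement each entry's age-out counter when the programmable period
    elapses, and return only the entries still considered live (age > 0).

    entries: list of dicts, each with an 'age' counter (illustrative layout).
    """
    if period_elapsed:
        for e in entries:
            e["age"] = max(0, e["age"] - 1)
    # Entries whose counter has reached zero are eligible for reclamation.
    return [e for e in entries if e["age"] > 0]
```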
14. Method according to claim 10, characterized in that it further comprises allowing the given load operation to issue from a reservation station (70) in response to the issue of all store operations older than the given load operation.
Similar technologies:
Publication number | Publication date | Patent title
BR102013010877B1|2021-07-06|load-store dependency predictor content management method and processor
TWI552069B|2016-10-01|Load-store dependency predictor, processor and method for processing operations in load-store dependency predictor
US9710268B2|2017-07-18|Reducing latency for pointer chasing loads
US9448936B2|2016-09-20|Concurrent store and load operations
US9582276B2|2017-02-28|Processor and method for implementing barrier operation using speculative and architectural color values
JP5799465B2|2015-10-28|Loop buffer learning
BR102013014996B1|2020-12-08|register renaming processor, method and unit|
JP5748800B2|2015-07-15|Loop buffer packing
US8352688B2|2013-01-08|Preventing unintended loss of transactional data in hardware transactional memory systems
US10437595B1|2019-10-08|Load/store dependency predictor optimization for replayed loads
US8856447B2|2014-10-07|Converting memory accesses near barriers into prefetches
US11113065B2|2021-09-07|Speculative instruction wakeup to tolerate draining delay of memory ordering violation check buffers
BRPI0805218A2|2010-08-17|pre/post-retirement hybrid hardware lock elision scheme|
Patent family:
Publication number | Publication date
US9128725B2|2015-09-08|
TW201403463A|2014-01-16|
JP2013239166A|2013-11-28|
JP2015232902A|2015-12-24|
KR101555166B1|2015-09-22|
BR102013010877A2|2015-06-30|
CN103455309B|2016-12-28|
TW201531939A|2015-08-16|
KR20130124221A|2013-11-13|
TWI529617B|2016-04-11|
EP2660716A1|2013-11-06|
US20130298127A1|2013-11-07|
JP5965041B2|2016-08-03|
CN103455309A|2013-12-18|
WO2013165754A1|2013-11-07|
KR20150075067A|2015-07-02|
EP2660716B1|2016-11-02|
Legal status:
2015-06-30| B03A| Publication of a patent application or of a certificate of addition of invention [chapter 3.1 patent gazette]|
2018-12-04| B06F| Objections, documents and/or translations needed after an examination request according [chapter 6.6 patent gazette]|
2019-11-19| B06U| Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]|
2020-09-29| B06A| Notification to applicant to reply to the report for non-patentability or inadequacy of the application [chapter 6.1 patent gazette]|
2021-04-27| B09A| Decision: intention to grant [chapter 9.1 patent gazette]|
2021-07-06| B16A| Patent or certificate of addition of invention granted|Free format text: TERM OF VALIDITY: 20 (TWENTY) YEARS COUNTED FROM 02/05/2013, SUBJECT TO THE LEGAL CONDITIONS. |
Priority:
Application number | Filing date | Patent title
US13/464,647|US9128725B2|2012-05-04|2012-05-04|Load-store dependency predictor content management|
US13/464,647|2012-05-04|